Univariate - MA Data Analysis

Univariate

Open days

##   obs_days open_days closed_days
## 1      169         8         161

Note: this table and the one that follows disagree. The is_closed counts (FALSE = 161) and n = 161 in the summaries below suggest 161 open days and 8 closed days, so the open_days and closed_days labels here appear to be swapped.

## # A tibble: 2 × 3
##   is_closed     n  prop
##   <lgl>     <int> <dbl>
## 1 FALSE       161  95.3
## 2 TRUE          8   4.7
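
The proportions in the is_closed table can be reproduced from the counts alone. The following is a minimal base R sketch. The vector `is_closed` is rebuilt from the reported counts (161 open, 8 closed) rather than read from the original data, which is not shown.

```r
# Rebuild the closed-day indicator from the reported counts
# (assumption: 161 open days and 8 closed days out of 169 observed).
is_closed <- c(rep(FALSE, 161), rep(TRUE, 8))

counts <- table(is_closed)                    # n per level
props  <- round(100 * prop.table(counts), 1)  # percentage per level

data.frame(
  is_closed = names(counts),
  n         = as.integer(counts),
  prop      = as.numeric(props)
)
```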

Basic Summary of Dependent Variables

## # A tibble: 4 × 13
##   variable        n   min   max median    q1    q3   iqr   mad  mean    sd    se
##   <fct>       <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 food_loss_…   161     0 13.8    7.35  6.7   8.4   1.7  1.11  7.83  2.17  0.171
## 2 food_waste…   161     0  6.55   2.1   1.1   2.95  1.85 1.33  2.19  1.40  0.111
## 3 liquid_was…   161     0  4.5    1.5   0.65  2.05  1.4  1.04  1.48  0.995 0.078
## 4 solid_wast…   161     0  2.95   0.65  0.35  0.95  0.6  0.445 0.708 0.499 0.039
## # ℹ 1 more variable: ci <dbl>

Histograms

X Histogram with density

X Q-Q plot

X Shapiro-Wilk test

## # A tibble: 3 × 3
##   variable        statistic             p
##   <chr>               <dbl>         <dbl>
## 1 food_waste_kg       0.952 0.0000260    
## 2 liquid_waste_kg     0.951 0.0000192    
## 3 solid_waste_kg      0.903 0.00000000783

From the output, all three p-values are far below 0.05, which implies that the distributions of these variables differ significantly from a normal distribution. In other words, we cannot assume normality.
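
The decision rule used here can be sketched in base R with `shapiro.test()`. The data below are simulated (one right-skewed sample, one normal sample) purely for illustration. They are not the study data, and the column names are placeholders. The layout of the table above resembles the output of `rstatix::shapiro_test()`, but that is an assumption about how it was produced.

```r
set.seed(1)  # simulated data only; not the study data

# Hypothetical stand-ins: one right-skewed and one roughly normal variable
sim <- data.frame(
  skewed_kg = rgamma(161, shape = 2, rate = 1),
  normal_kg = rnorm(161, mean = 2, sd = 0.5)
)

# Run the Shapiro-Wilk test on every column and collect W and the p-value
res <- do.call(rbind, lapply(names(sim), function(v) {
  st <- shapiro.test(sim[[v]])
  data.frame(variable  = v,
             statistic = unname(st$statistic),
             p         = st$p.value)
}))

# Reject normality at the 5% level when p < 0.05
res$reject_normality <- res$p < 0.05
res
```

A small p-value is evidence against normality. A large one only means the test found no evidence against it, which is weaker than confirming normality.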

Histogram Food Waste per customer

Q-Q plot Food Waste per customer

Shapiro-Wilk test per customer

## # A tibble: 3 × 3
##   variable          statistic        p
##   <chr>                 <dbl>    <dbl>
## 1 food_waste_p_kg       0.987 1.38e- 1
## 2 liquid_waste_p_kg     0.984 6.10e- 2
## 3 solid_waste_p_kg      0.863 6.24e-11

From the output, the p-value for solid waste per customer is far below the significance level of 0.05, but the p-values for food waste (0.138) and liquid waste (0.061) per customer are not. This implies that only the distribution of solid waste per customer differs significantly from a normal distribution. In other words, normality is not rejected for food waste and liquid waste per customer, but it is rejected for solid waste per customer.

Time Series Plots

Daily Time Series

Daily plot per customer

Decomposition

## 
##  Fitting models using approximations to speed things up...
## 
##  ARIMA(2,0,2) with non-zero mean : 595.2761
##  ARIMA(0,0,0) with non-zero mean : 607.2775
##  ARIMA(1,0,0) with non-zero mean : 598.3493
##  ARIMA(0,0,1) with non-zero mean : 606.2906
##  ARIMA(0,0,0) with zero mean     : 795.7987
##  ARIMA(1,0,2) with non-zero mean : 593.7226
##  ARIMA(0,0,2) with non-zero mean : 603.5818
##  ARIMA(1,0,1) with non-zero mean : 598.3892
##  ARIMA(1,0,3) with non-zero mean : 594.7845
##  ARIMA(0,0,3) with non-zero mean : 602.7266
##  ARIMA(2,0,1) with non-zero mean : 593.1346
##  ARIMA(2,0,0) with non-zero mean : 593.03
##  ARIMA(3,0,0) with non-zero mean : 591.0829
##  ARIMA(4,0,0) with non-zero mean : 593.9004
##  ARIMA(3,0,1) with non-zero mean : 593.1032
##  ARIMA(4,0,1) with non-zero mean : 594.6705
##  ARIMA(3,0,0) with zero mean     : 655.5828
## 
##  Now re-fitting the best model(s) without approximations...
## 
##  ARIMA(3,0,0) with non-zero mean : 600.6932
## 
##  Best model: ARIMA(3,0,0) with non-zero mean
## Series: df$food_waste_kg 
## ARIMA(3,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2      ar3    mean
##       0.1053  -0.2083  -0.1262  2.0746
## s.e.  0.0788   0.0769   0.0786  0.0871
## 
## sigma^2 = 1.97:  log likelihood = -295.16
## AIC=600.33   AICc=600.69   BIC=615.97
## 
##  Fitting models using approximations to speed things up...
## 
##  ARIMA(2,0,2) with non-zero mean : 242.2204
##  ARIMA(0,0,0) with non-zero mean : 254.9591
##  ARIMA(1,0,0) with non-zero mean : 242.9804
##  ARIMA(0,0,1) with non-zero mean : 254.9337
##  ARIMA(0,0,0) with zero mean     : 424.4576
##  ARIMA(1,0,2) with non-zero mean : 240.5345
##  ARIMA(0,0,2) with non-zero mean : 253.0456
##  ARIMA(1,0,1) with non-zero mean : 242.4608
##  ARIMA(1,0,3) with non-zero mean : 241.1252
##  ARIMA(0,0,3) with non-zero mean : 252.9766
##  ARIMA(2,0,1) with non-zero mean : 240.7382
##  ARIMA(2,0,3) with non-zero mean : 243.1306
##  ARIMA(1,0,2) with zero mean     : 290.294
## 
##  Now re-fitting the best model(s) without approximations...
## 
##  ARIMA(1,0,2) with non-zero mean : 252.8433
## 
##  Best model: ARIMA(1,0,2) with non-zero mean
## Series: df$solid_waste_kg 
## ARIMA(1,0,2) with non-zero mean 
## 
## Coefficients:
##          ar1      ma1      ma2    mean
##       0.3933  -0.3011  -0.2195  0.6723
## s.e.  0.2334   0.2269   0.0728  0.0303
## 
## sigma^2 = 0.2516:  log likelihood = -121.24
## AIC=252.48   AICc=252.84   BIC=268.12
## 
##  Fitting models using approximations to speed things up...
## 
##  ARIMA(2,0,2) with non-zero mean : 481.848
##  ARIMA(0,0,0) with non-zero mean : 489.7931
##  ARIMA(1,0,0) with non-zero mean : 483.6428
##  ARIMA(0,0,1) with non-zero mean : 488.6056
##  ARIMA(0,0,0) with zero mean     : 668.5145
##  ARIMA(1,0,2) with non-zero mean : 481.4292
##  ARIMA(0,0,2) with non-zero mean : 487.558
##  ARIMA(1,0,1) with non-zero mean : 484.5832
##  ARIMA(1,0,3) with non-zero mean : 482.8695
##  ARIMA(0,0,3) with non-zero mean : 487.0004
##  ARIMA(2,0,1) with non-zero mean : 480.5155
##  ARIMA(2,0,0) with non-zero mean : 480.0232
##  ARIMA(3,0,0) with non-zero mean : 478.3711
##  ARIMA(4,0,0) with non-zero mean : 480.7297
##  ARIMA(3,0,1) with non-zero mean : 480.1401
##  ARIMA(4,0,1) with non-zero mean : 479.0072
##  ARIMA(3,0,0) with zero mean     : 539.5893
## 
##  Now re-fitting the best model(s) without approximations...
## 
##  ARIMA(3,0,0) with non-zero mean : 484.9027
## 
##  Best model: ARIMA(3,0,0) with non-zero mean
## Series: df$liquid_waste_kg 
## ARIMA(3,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3    mean
##       0.1128  -0.1804  -0.124  1.4030
## s.e.  0.0780   0.0767   0.078  0.0638
## 
## sigma^2 = 0.9932:  log likelihood = -237.27
## AIC=484.53   AICc=484.9   BIC=500.18
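
The trace logs above resemble the output of `forecast::auto.arima()` with `trace = TRUE`, although the call itself is not shown. As a rough illustration of what the selected model means, the sketch below fits an ARIMA(3,0,0) with a non-zero mean using base R `arima()` and computes AICc by hand. It fits one fixed order and does not reproduce the automatic search. The series is simulated from the coefficients reported for food_waste_kg, so the estimates will differ from those above.

```r
set.seed(42)  # simulated series; not the study data

# Simulate 169 points from an AR(3) process using the coefficients
# reported above for food_waste_kg, then add the reported mean.
phi <- c(0.1053, -0.2083, -0.1262)
y <- 2.0746 + arima.sim(model = list(ar = phi), n = 169, sd = sqrt(1.97))

# Fit ARIMA(3,0,0) with a non-zero mean (arima() labels it "intercept")
fit <- arima(y, order = c(3, 0, 0), include.mean = TRUE)

# Small-sample corrected AIC: AICc = AIC + 2k(k + 1) / (n - k - 1),
# where k counts the AR terms, the mean, and the innovation variance.
k <- length(coef(fit)) + 1
n <- length(y)
aicc <- AIC(fit) + 2 * k * (k + 1) / (n - k - 1)

coef(fit)
aicc
```

As a check on the reported food_waste_kg fit: with a log likelihood of -295.16 and k = 5, AIC = 2 × 295.16 + 2 × 5 = 600.32, which matches the reported 600.33 up to rounding.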

Boxplots - weekly

Boxplots per customer - weekly

Bar plot - weekly

Boxplot - monthly

Boxplot per customer - monthly

Time Series Plots for Independents

## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

(Partial and) Autocorrelation Function

Spectral Analysis

## [1] 3.214286
## [1] 5.294118
## [1] 5.142857
## [1] 5.294118

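The dominant period is roughly 5 to 6 days for the food waste series (5.1 to 5.3 days in the output above), whereas food loss shows a cycle of approximately 3 days (3.2 days in the output) or possibly about 20 days.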
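
A dominant period like those printed above is the reciprocal of the frequency at which the periodogram peaks. The base R sketch below shows the idea on a simulated series with a known 5-day cycle. The series, its length, and the `spec.pgram()` settings are assumptions for illustration, not the original analysis. The reported values are ratios of the form 180/k (180/56, 180/34, 180/35), which is what this approach yields when a 169-point series is padded to 180.

```r
set.seed(7)  # simulated series; not the study data

# Hypothetical daily series of 169 points with a 5-day cycle plus noise
n <- 169
t <- seq_len(n)
x <- sin(2 * pi * t / 5) + rnorm(n, sd = 0.5)

# Raw periodogram; spec.pgram() pads the series to a convenient
# length for the FFT (180 for 169 points)
pg <- spec.pgram(x, taper = 0, detrend = TRUE, plot = FALSE)

# Dominant period in days = 1 / frequency at the largest spectral peak
peak_freq <- pg$freq[which.max(pg$spec)]
1 / peak_freq
```

Because only the single largest peak is read off, a secondary cycle (such as the possible 20-day cycle for food loss) would need to be identified by inspecting the full spectrum rather than this one number.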